ABSTRACT

Emotion is one of the most essential factors in predicting human
nature and understanding human behaviour. Although recognizing
another person's emotion is an easy task for a human being, it is not
so for a computer. A lot of research is therefore being conducted to
predict behaviour correctly, with higher precision and accuracy.


This paper demonstrates real-time facial emotion recognition into
one of seven categories of emotion: angry, disgust, fear, happy,
neutral, sad and surprise. We use a simple 4-layer Convolution Neural
Network (CNN). We have also implemented various filters and
pre-processing steps to remove noise, and have taken care to avoid
over-fitting. We have tried to improve the accuracy of the model by
applying various filters and optimizing the data for feature
extraction, so as to obtain accurate predictions. The dataset used for
training and testing is FER2013, and the proposed trained model gives
an accuracy of about 73%.

Keyword: Emotion

I. INTRODUCTION

Emotion constitutes an important part of human
behaviour. Understanding human behaviour and
predicting it can revolutionize the business models of
our society. The ability to understand emotion will
also play a role in understanding non-verbal
communication. The uses of emotion recognition are
limitless. Consider a case where a seller can easily
tell whether or not a customer liked a product, and
by how much. This has a massive market potential
yet to be explored. Beyond this, emotion recognition
also has huge potential in security, robotics,
surveillance, marketing, industry and many other
fields.


It is very easy for a human to recognize another
person's emotion by looking at their face; the brain
does the work automatically. For a machine, however,
this is not the case. It needs to perform numerous
calculations, run numerous algorithms and optimize
over numerous data sets to train the model.


In recent years, scientists have developed various
algorithms, such as K-nearest neighbour (KNN),
Decision Tree (DT), Probabilistic Neural Network
(PNN), Random Forest, Support Vector Machine (SVM)
and Convolution Neural Network (CNN).


This paper proposes a CNN architecture because it has
shown better results than other algorithms in the area
of emotion recognition, with higher accuracy and
precision.


In this paper we use four convolution layers with the
Rectified Linear Unit (ReLU) as the activation
function. The proposed model undergoes a series of
pre-processing, feature detection and feature
extraction steps using techniques such as Haar
Cascade. We have taken care of over-fitting by
applying dropout after every layer. The proposed
model is trained on the FER2013 data set, and the
best-scoring of the seven emotion expressions is
shown as the result.


II. LITERATURE SURVEY


Various deep learning and machine learning algorithms are being applied, and many people are working on
different algorithms.[36][37][38][39]


Date / Year | Member Name | Problem Description | Possible Solution | Reference
--- | --- | --- | --- | ---
2017 | [illegible] Zhang, Jingjing Zhang, Jun Zhang, Teng Li, Yi Xia, Qing Yan, Lina Xun | Facial Expression Recognition | Faster R-CNN (Faster Regions with Convolution Neural Networks Features) | The School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
2017 | Xuan Liu, Junbao Li, Cong Hu, Jeng-Shyang Pan | Age and Gender Classification with Facial Image | D-CNN (Deep Convolutional Neural Networks) | Harbin Institute of Technology, Harbin 150080, China; Fujian University of Technology, Fuzhou 350108, China
14th-16th March, 2018 | Raghav Puri, Mohit Tiwari, Archit Gupta, Nitish Pathak, Manas Sikri, Shivendra Goel | Emotion Detection using Image Processing | Using Python (version 2.7) and Open Source Computer Vision Library (OpenCV) and numpy | Electronics & Communication Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi, India
2018 | Liu Hui, Song Yu-jie | Research on face recognition algorithm | Fisher Convolutional Neural Networks; P-SVM (Profile Support Vector Machine) | College of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, China
2013 | [illegible] Pramerdorfer, Martin Kampel | Expression recognition | Using Convolutional Neural Networks | Computer Vision Lab, TU Wien, Vienna, Austria
2020 | Huibai Wang, Siyang Hou | Facial expression recognition | The Fusion of CNN and SIFT Features | College of Information Science and Technology, North China University of Technology, Beijing, China
2020 | Chen Jia, Chu Li Li, Zhou Yin | Facial expression recognition | Ensemble learning of CNNs | School of Electronics and Information Engineering, Liaoning University of Technology, JinZhou, China


III. DATASET


The data set used for training is FER2013, an open-source dataset that contains 35,887 grayscale images of
48x48 pixels, sorted into seven emotion categories: angry, disgust, fear, joy (happy), neutral, sad and
surprise. The CSV contains two columns: the first column contains the emotion label, from 0 to 6, and the
second column contains a string surrounded by quotes. The string holds the pixel values of the image.
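
As a minimal sketch of how one such CSV row could be decoded, the snippet below parses a label and a quoted pixel string into a 48x48 grid. It assumes the pixel values are separated by spaces; the helper name `parse_row` and the synthetic `sample_line` are illustrative and not part of the dataset.

```python
import csv
import io

IMAGE_SIDE = 48  # FER2013 images are 48x48 pixels


def parse_row(row):
    """Turn one (label, pixel-string) CSV row into a label and a 48x48 grid."""
    label = int(row[0])
    # Assumption: pixel values in the string are separated by whitespace.
    values = [int(v) for v in row[1].split()]
    if len(values) != IMAGE_SIDE * IMAGE_SIDE:
        raise ValueError(
            f"expected {IMAGE_SIDE * IMAGE_SIDE} pixels, got {len(values)}"
        )
    # Slice the flat list into 48 rows of 48 pixels each.
    image = [
        values[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)
    ]
    return label, image


# Synthetic stand-in for one CSV line: label 3 followed by a quoted pixel string.
sample_line = (
    '3,"' + " ".join(str(i % 256) for i in range(IMAGE_SIDE * IMAGE_SIDE)) + '"\n'
)
label, image = parse_row(next(csv.reader(io.StringIO(sample_line))))
print(label, len(image), len(image[0]))
```

The `csv` module takes care of the surrounding quotes, so only the split on whitespace has to be done by hand.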
Fig. 1. A sample from the FER2013 data set


> The FER2013 dataset is divided into two directories, i.e., 1. train, 2. test
> Each of them consists of seven sub-directories, one for each of the seven emotion categories.
> Each sub-directory contains images of a specific expression taken from various sources.
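
The layout described above can be checked with a short script such as the following sketch. The sub-directory names in `EMOTIONS` and the `.png` extension are assumptions about how the folders and files are named; the demo builds a throwaway tree so that it runs without the real dataset.

```python
import tempfile
from pathlib import Path

# Assumption: sub-directories are named after the seven emotion categories.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def count_images(root):
    """Return {split: {emotion: number of PNG files}} for a train/test layout."""
    counts = {}
    for split in ("train", "test"):
        counts[split] = {}
        for emotion in EMOTIONS:
            folder = Path(root) / split / emotion
            if folder.is_dir():
                counts[split][emotion] = sum(1 for _ in folder.glob("*.png"))
            else:
                counts[split][emotion] = 0
    return counts


# Build a tiny throwaway tree so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "test"):
        for emotion in EMOTIONS:
            (Path(tmp) / split / emotion).mkdir(parents=True)
    (Path(tmp) / "train" / "happy" / "img_0.png").touch()
    demo_counts = count_images(tmp)

print(demo_counts["train"]["happy"], demo_counts["test"]["happy"])
```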


IV. METHODOLOGY 
Now we discuss various methods which we have used for predicting the emotion. 


Fig. 2. Face detection and Feature extraction


We went through several steps to extract data and find faces and then run them through the trained model, which 
is based on the CNN architecture.[12] 


A. Face and Feature Detection 

This is one of the early stages of image processing, where we break the video into frames and then process them
image by image, trying to detect a face; if multiple faces are found, each of them is processed. Before face and
feature extraction, we resize the frames, convert the image to 48x48 pixels and convert it to
greyscale. We then use OpenCV for face detection.[9][10][11][13]


Date & Year | Member Name | Problem Description | Possible Solution | Reference
--- | --- | --- | --- | ---
30th May, 2021 | Lutfiah Zahara, Purnawarman Musa, Eri Prasetyo Wibowo, Irwan Karim, Saiful Bahri Musa | Facial Emotion Recognition (FER-2013) Dataset for Prediction System of Micro-Expressions Face | Using the Convolutional Neural Network (CNN) Algorithm based Raspberry Pi | Department of Computer Science, Gunadarma University, Depok, Indonesia
25th-27th May, 2018 | Xuefeng Liu, Qiaoqiao Sun, Yue [illegible], [illegible] Wang, Min Fu | Feature Extraction and Classification of Hyperspectral Image | 3D-CNN (3D-Convolution Neural Network) | College of [illegible] Electronic Engineering, Qingdao University of Science and Technology, Qingdao, China
2018 | Yi Dian, Shi Xiaohong, Xu Hao | Dropout Method of Face Recognition | Using a Deep Convolution Neural Network | [illegible] University, Shanghai, China; 1241975543@qq.com
2019 | Gu Shengtao, Xu Chao, Feng Bo | Facial expression recognition | Global and Local feature fusion with CNNs | School of Electronics and Information Engineering, AnHui University
26th-28th July, 2017 | Kewen Yan, Shaohui Huang, Yaoxian Song, Wei Liu, Neng Fan | Face Recognition | Based on Convolution Neural Network | School of Automation, Hangzhou Dianzi University, HangZhou 310018
2017 | Ahmed Ali Mohammed Al-Saffar, Hai Tao, Mohammed Ahmed Talab | Image Classification | Deep Convolution Neural Network | Faculty of Computer Systems and Software Engineering, University Malaysia Pahang, Pahang, Malaysia
21st-23rd November, 2019 | George [illegible] Porusniuc, Florin Leon, Radu Timofte, Casian Miron | Architectures for Facial Expression Recognition | CNNs (Convolutional Neural Networks) | “Gheorghe Asachi” Technical University, Iasi; University of Eastern Finland, Joensuu, Finland; ETH Zurich, Zurich, Switzerland


For this, we use the Haar Cascade classifier from OpenCV. The classifier, which was proposed by Paul Viola and
Michael Jones in 2001, is quite effective for this task.[5][6][7][8]
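
The face detection step can be sketched roughly as follows. This is an illustrative outline, not the exact code used for the paper: it assumes the `opencv-python` package and its bundled `haarcascade_frontalface_default.xml` file, the `scaleFactor` and `minNeighbors` values are common defaults rather than tuned settings, and the helper names `clip_box` and `detect_faces` are hypothetical. Here the frame is converted to greyscale first and each detected face is then cropped and resized to 48x48.

```python
def clip_box(box, frame_width, frame_height):
    """Clip an (x, y, w, h) box so that it stays inside the frame."""
    x, y, w, h = box
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_width, x + w), min(frame_height, y + h)
    return x0, y0, max(0, x1 - x0), max(0, y1 - y0)


def detect_faces(frame_bgr, size=48):
    """Return a list of size x size greyscale crops, one per detected face."""
    import cv2  # imported lazily so clip_box can be used without OpenCV

    # Assumption: the frontal-face cascade shipped with opencv-python is used.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    height, width = gray.shape
    crops = []
    for box in boxes:  # every detected face is processed
        x, y, w, h = clip_box(tuple(int(v) for v in box), width, height)
        if w == 0 or h == 0:
            continue
        crops.append(cv2.resize(gray[y:y + h, x:x + w], (size, size)))
    return crops
```

A caller would read frames from a video source, pass each frame to `detect_faces`, and feed the returned crops to the trained model.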


B. Feature Extraction 

In this step, the most critical components of the face (the eyebrows, eyes, nose, chin, mouth and jaw) are
extracted, cropped and optimized for greater precision. The extracted data is saved in numpy format.
For example, in Fig. 2 the green part of the face is extracted and cropped, stored in numpy format, and
later passed to the network layers for extraction and processing.[20][24][26][27][29]


C. CNN architecture 

The next step is to build the model, and for that we used a CNN. In deep learning, a convolutional neural
network (CNN, or ConvNet) is a class of artificial neural network most commonly applied to
analyse visual imagery. A convolutional neural network consists of multiple building blocks, such as
convolution layers, pooling layers and fully connected layers, and is designed to learn spatial
hierarchies of features automatically and adaptively through a backpropagation algorithm. It was first proposed by the
scientist Yann LeCun, who was inspired by the way humans perceive their surroundings [31][32][35] and understand
them. CNNs have proved to be very successful in the research area of Facial Emotion Recognition (FER)
because they can perform feature extraction and image classification simultaneously with high precision, making them an ideal
methodology for image classification.[14][15][16][17][18][21]
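
To make the building blocks concrete, here is a small pure-Python sketch of a convolution, a ReLU activation and a max-pooling step applied to a toy 6x6 image. The image and the edge-detecting kernel are made up for illustration; a real CNN learns its kernels during training and uses an optimized library rather than nested lists.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and take dot products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Toy 6x6 image with a vertical edge: left half dark (0), right half bright (9).
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = relu(conv2d_valid(image, kernel))  # 4x4 feature map
pooled = max_pool(features)                   # 2x2 after pooling
print(features[0], pooled)
```

The 3x3 kernel shrinks the 6x6 input to 4x4, and the 2x2 pooling halves it again to 2x2, the same pattern that reduces the 48x48 input in the proposed model.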


V. IMPLEMENTATION 

We have trained the model on the train data set available in FER2013, i.e., 28,709 images, and for testing
purposes we have reserved 7,178 pictures, which are in the test sub-folder of FER2013. All the images are
48x48 pixels, grayscale and in PNG format.[19][22][23][25][27]


Fig. 3. Block diagram of the proposed model: face recognition using Haar Cascade, feature extraction, CNN architecture, and emotions classified into 1. Anger, 2. Disgust, 3. Fear, 4. Happy, 5. Sad, 6. Surprise, 7. Neutral

There are two steps in the proposed model. The first step involves processing the image and extracting the faces
using the Haar cascade, as shown in Fig. 3; the image is scaled down to 48x48 pixels and
converted to grayscale. It is then passed to the CNN architecture, which is our second module.


Our CNN architecture consists of four convolution layers and uses ReLU as the activation function. The layers
use 32, 64, 128 and 128 filters respectively, each with a 3x3 kernel. In each convolution layer the 3x3
kernel is slid over the input and a dot product is calculated at every position. The result is handed to max pooling, which keeps the
maximum of each 2x2 window and so halves each spatial dimension, and then, to manage over-fitting, a dropout of 0.25 is applied.
The convolution and max pooling steps are then repeated, and finally the output is flattened and a hidden dense
layer of 1024 nodes is created. A dropout of 0.50 is applied, and the output layer then classifies the photo into the
seven categories. The output layer is a dense layer with softmax as the activation
function and seven outputs. With these layers the convolution neural network produces an
accuracy of 63% on this data set. The illustration of the above implementation is shown in Fig. 4. The input
and output shapes of the various layers are shown along with the batch dimension, i.e., how many images are processed at a
given time, and the output layer is also shown. The model was later trained for various numbers of epochs and its accuracy was
measured at each; what we found was that the accuracy stops increasing after about 20 epochs, as shown in
the graphs of Fig. 5 and Fig. 6. [40]
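
As an illustration of the final classification step, the sketch below applies softmax to seven output scores and picks the most likely emotion. The order of labels in `EMOTIONS` and the example scores in `logits` are assumptions made for the demo; the actual index-to-emotion mapping depends on how the training labels were encoded.

```python
import math

# Assumption: label indices follow this order; the real mapping may differ.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most likely emotion and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


# Made-up scores for the seven output nodes of the last dense layer.
logits = [0.2, -1.0, 0.1, 2.5, 0.3, -0.4, 0.0]
label, confidence = predict_label(logits)
print(label, round(confidence, 3))
```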


Layer | Type | Input shape | Output shape
--- | --- | --- | ---
conv2d_4_input | InputLayer | (None, 48, 48, 1) | (None, 48, 48, 1)
conv2d_4 | Conv2D | (None, 48, 48, 1) | (None, 46, 46, 32)
conv2d_5 | Conv2D | (None, 46, 46, 32) | (None, 44, 44, 64)
max_pooling2d_3 | MaxPooling2D | (None, 44, 44, 64) | (None, 22, 22, 64)
dropout_3 | Dropout | (None, 22, 22, 64) | (None, 22, 22, 64)
conv2d_6 | Conv2D | (None, 22, 22, 64) | (None, 20, 20, 128)
max_pooling2d_4 | MaxPooling2D | (None, 20, 20, 128) | (None, 10, 10, 128)
conv2d_7 | Conv2D | (None, 10, 10, 128) | (None, 8, 8, 128)
max_pooling2d_5 | MaxPooling2D | (None, 8, 8, 128) | (None, 4, 4, 128)
dropout_4 | Dropout | (None, 4, 4, 128) | (None, 4, 4, 128)
flatten_1 | Flatten | (None, 4, 4, 128) | (None, 2048)
dense_2 | Dense | (None, 2048) | (None, 1024)
dropout_5 | Dropout | (None, 1024) | (None, 1024)
dense_3 | Dense | (None, 1024) | (None, 7)

Fig. 4. Input and output shapes of each layer of the proposed model
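
The layer shapes above can be reproduced with a few lines of arithmetic. The sketch assumes unpadded 3x3 convolutions with stride 1 and 2x2 max pooling, which is what the reported shapes imply; dropout layers are left out because they do not change the shape.

```python
def conv_out(size, kernel=3):
    """Output side length of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Output side length of non-overlapping max pooling."""
    return size // window


def trace_shapes(side=48):
    """Follow a 48x48x1 input through the conv and pooling layers."""
    layers = [
        ("conv", 32), ("conv", 64), ("pool", None),
        ("conv", 128), ("pool", None),
        ("conv", 128), ("pool", None),
    ]
    channels = 1
    shapes = [(side, side, channels)]
    for kind, filters in layers:
        if kind == "conv":
            side, channels = conv_out(side), filters
        else:
            side = pool_out(side)
        shapes.append((side, side, channels))
    return shapes


shapes = trace_shapes()
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
for shape in shapes:
    print(shape)
print("flattened:", flattened)
```

The last feature map is 4x4x128, so flattening gives 4 * 4 * 128 = 2048 values, which matches the input of the 1024-node dense layer.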
Fig. 5. Epoch vs. accuracy graph for 10 epochs


Fig. 6. Epoch vs. accuracy graph for 100 epochs


VI. CONCLUSION 
We have implemented a CNN together with various
pre-processing algorithms and have reached an
accuracy of more than 63 percent on the FER2013
dataset, which is in itself a difficult dataset; the
algorithm can be improved and tuned further to
achieve better precision. For testing purposes, we
took 100 images at random from each expression's
test sub-folder and passed each image through the
predicting model; whenever the model predicts
correctly, an accuracy counter is incremented. In this
experiment on 700 images, taken randomly and
evenly from the different categories, we correctly
predicted 443 out of 700 images, which gives an
accuracy of about 63.3%.
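
The counting procedure described above amounts to the following sketch. The lists `actual` and `predicted` are stand-in data built only to reproduce the reported tally of 443 correct predictions out of 700; they are not the model's real outputs.

```python
def accuracy(predicted, actual):
    """Count matching predictions and return (correct, fraction correct)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct, correct / len(actual)


# Stand-in labels: 700 test images, 100 for each of the seven classes.
actual = [i % 7 for i in range(700)]
# Stand-in predictions: the first 443 are right, the rest are deliberately wrong.
predicted = [a if i < 443 else (a + 1) % 7 for i, a in enumerate(actual)]

correct, acc = accuracy(predicted, actual)
print(correct, f"{acc:.1%}")
```

With 443 correct out of 700, the fraction is 443 / 700, or roughly 63.3%.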